different illumination regions, less textured regions and different 
environmental exposure. 
1. INTRODUCTION 

Computer vision is a significant research topic in computer systems owing to the growing use of
autonomous navigation, industrial monitoring, unmanned driving, virtual reality, image-based rendering
and vision-based object handling. All of these applications require an accurate vision system to ensure
that the system works as designed. One of the aims of computer vision is to provide accurate image
information and to recover scene properties such as shape, illumination and colour distribution. Stereo
vision is one of the main areas of research in the computer vision field. Inspired by human eyes, a stereo
vision system uses two cameras to capture a scene simultaneously and then processes the images to obtain
the distance between the objects in the scene and the cameras, which is known as the depth value. Stereo
matching is the common method used to determine the depth information from stereo images. It determines
the disparity value of every pixel in the stereo images [1]. The disparity value is obtained by calculating
the difference in pixel position between two corresponding points in the left image and the right image.
A pre-processing algorithm is applied to rectify the images so that the disparity value can be found
easily [2].

The process of determining the depth value of the stereo images depends on how the algorithm is
designed to compute the disparity value. Algorithms can be classified into two major categories, global
methods and local methods [3]. A global method computes the disparity values by optimizing a general
energy function over all pixels of an image [4]. This gives higher accuracy, but it also requires high
computational complexity [5]. A local method, on the other hand, calculates the cost volume within a
limited area of the image and gives lower accuracy in stereo matching than a global method. Because of
their lower computational complexity, local methods are used in most real-time applications of stereo
systems [6]. Scharstein et al. [3] proposed four steps for a stereo matching algorithm: matching cost
computation, cost aggregation, disparity optimization and disparity refinement. The challenging problem
in stereo matching is to increase the accuracy of the depth value. Correct matching in low-texture regions
is one of the main challenges in obtaining an accurate disparity map, since the possibility of matching
with an invalid correspondence point is high. An illumination difference between the image pair also
affects the accuracy of the disparity map generated by the matching algorithm. To address these issues,
this paper proposes a stereo matching framework that aims to improve the accuracy of the depth value in
less textured regions and in images with illumination differences. In recent years, a variety of stereo
matching algorithms have been proposed by researchers to increase matching accuracy. Zhang et al. [7]
proposed cross-based stereo matching, which gave accurate disparity estimation compared with global
methods. Yang et al. [8] proposed a simple local matching algorithm that is efficient to implement.

The most common matching cost computations, such as the sum of absolute differences (SAD), the sum of
squared differences (SSD) and normalized cross correlation (NCC), are the traditional similarity measures
for stereo matching using the block matching method. However, these measures are very sensitive to
amplitude distortion, which leads to low accuracy [9]. Non-parametric local transforms, namely the census
transform (CT) and the rank transform (RT), have been introduced [10]. Both are more resistant to
radiometric distortion because they depend on the relative order of pixel intensities instead of the
intensity values themselves [11]. Therefore, both types of non-parametric local transform are able to cope
well with matching uncertainties in image regions with similar colors, while SAD and SSD may cope well
with image regions with similar local structures [12]. For this reason, researchers started to combine
matching cost computations. The combination of CT with other matching costs has gained attention because
the combined matching cost shows better results. Lee et al. [11] proposed a combination of CT with
gradient distance as the cost computation method. Zhu et al. [13] also used a combination of CT with
gradient as the matching cost. Wang et al. [14] used a combination of CT and absolute difference as the
cost computation method. The combination of SAD and gradient matching in cost computation also produced
better accuracy of the disparity value [15].

Cost aggregation is the second step in stereo matching and directly influences the overall efficiency
and accuracy of the matching algorithm. In most local matching algorithms, filter-based cost aggregation
is commonly adopted, in which the matching cost volume is filtered. The simplest approaches are the box
filter and the Gaussian filter, where the aggregation is performed only within a fixed window size [13].
However, these filtering techniques yield poor disparity maps with fattened edges. In local methods, the
guided filter (GF) and the bilateral filter (BF) have become popular edge-preserving filters for cost
aggregation, since both are able to produce good-quality disparity maps with fine edges and better
accuracy [16]. GF has gained better performance and efficiency than BF since the complexity of the GF
algorithm is lower. GF has become a popular approach used by researchers to develop accurate stereo
matching algorithms [17]. The study by Hosni et al. [18] showed that GF is robust when applied to stereo
matching. Various cost aggregation methods have since been applied in stereo matching algorithms based on
that study. The weighted GF proposed by Kong et al. [19] improves the performance of GF. The cost volume
filtering method proposed by Hong and Kim [20] resulted in an improved disparity map. The adaptive weight
of the local variable is used to control the linear coefficient based on the local texture features. Wu
et al. [21] proposed combining GF with a minimum spanning tree (MST) filter, which increases robustness in
highly textured and textureless regions. Hamzah et al. [22] introduced an adaptive support weight based on
iterative GF. Moreover, Zhu and Chang [23] proposed an innovative weighted-combination scheme of the GF
model which improved the matching cost volume. In this paper, GF is adopted as the cost aggregation
method, with the aim of producing a disparity map with fine edges and improving the accuracy of the
results.

This paper proposes a stereo matching algorithm that uses a combination of two measures, SAD and CT, in
the cost computation step. The SAD cost is very sensitive to amplitude distortion and illumination
differences; by combining it with the CT cost function, a more accurate and robust cost function is
developed. GF is selected as the filter-based cost aggregation method, which aims to improve the quality
of the depth map at edges and discontinuity zones. The optimization step then uses the winner-takes-all
(WTA) strategy. At this point, each valid pixel is assigned the disparity with the lowest aggregated cost.
However, certain invalid pixels are always present in the less textured areas. The final stage consists of
post-processing, which uses a weighted median (WM) filter in order to obtain a more accurate disparity
map.


2. RESEARCH METHOD 
The SAD method is used to measure the intensity difference between the left and right images. SAD is
computed using (1):

$$\mathrm{SAD}(p,d) = \sum_{i} \left| I_l^{i}(p) - I_r^{i}(p-d) \right| \qquad (1)$$

where $p$ denotes the pixel at coordinates $(x, y)$, $i$ indexes the RGB channels, $d$ is the disparity
value, $I_l$ represents the left image and $I_r$ represents the right image. The CT process maps the
pixels surrounding a target pixel to a bit string that encodes the intensity of each neighbouring pixel
relative to the target pixel [10]. The CT process is based on (2):


$$\mathrm{CT}(p) = \bigotimes_{q \in w_{CT}} \mathrm{cen}(p,q) \qquad (2)$$

where $p$ and $q$ represent the target pixel and the neighbouring pixels respectively, $\bigotimes$
denotes the bit-wise concatenation of the values $\mathrm{cen}(p,q)$, $w_{CT}$ is the window and
$\mathrm{cen}(p,q)$ is the binary function given by (3):

$$\mathrm{cen}(p,q) = \begin{cases} 1, & I(p) \ge I(q) \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $I(p)$ and $I(q)$ are the values of the target pixel and the neighbouring pixel respectively. Using
the CT of (2), the cost at each candidate correspondence is the difference between two bit strings,
measured by the Hamming distance (HD) as given by (4):


$$\mathrm{CT}'(p,d) = \mathrm{HD}\big(\mathrm{CT}_l(p),\, \mathrm{CT}_r(p-d)\big) \qquad (4)$$

where $\mathrm{CT}_l$ is the bit string obtained from the CT of the left image and $\mathrm{CT}_r$ is the
bit string obtained from the CT of the right image. The two matching costs are combined using the
normalized cost function proposed in [11], given by (5):

$$M_c(p,d) = 2 - \exp\big(-\mathrm{SAD}(p,d)\big) - \exp\big(-\mathrm{CT}'(p,d)\big) \qquad (5)$$
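
The matching cost of (1) to (5) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the images are assumed to be floating-point RGB arrays scaled to [0, 1], the census transform is assumed to run on the channel mean with replicated borders, and the function names (`sad_cost`, `census`, `census_cost`, `combined_cost`) and the `radius` parameter are hypothetical.

```python
import numpy as np


def sad_cost(left, right, d):
    """Eq. (1): sum over colour channels of |I_l(p) - I_r(p - d)| for one disparity d."""
    h, w, _ = left.shape
    cost = np.full((h, w), np.inf)  # columns x < d have no counterpart in the right image
    diff = np.abs(left[:, d:] - right[:, :w - d])
    cost[:, d:] = diff.sum(axis=2)
    return cost


def census(gray, radius=1):
    """Eqs. (2)-(3): one bit per neighbour q, set when I(p) >= I(q)."""
    h, w = gray.shape
    padded = np.pad(gray, radius, mode="edge")  # assumption: replicate border pixels
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(gray >= neighbour)
    return np.stack(bits, axis=-1)  # boolean bit string per pixel, shape (h, w, n_bits)


def census_cost(ct_left, ct_right, d):
    """Eq. (4): Hamming distance between CT_l(p) and CT_r(p - d)."""
    h, w, _ = ct_left.shape
    cost = np.full((h, w), np.inf)
    cost[:, d:] = (ct_left[:, d:] != ct_right[:, :w - d]).sum(axis=2)
    return cost


def combined_cost(left, right, max_disp, radius=1):
    """Eq. (5): M(p, d) = 2 - exp(-SAD) - exp(-CT') for d = 0..max_disp."""
    # Assumption: the census transform runs on the channel mean as a grayscale image.
    ct_l = census(left.mean(axis=2), radius)
    ct_r = census(right.mean(axis=2), radius)
    volume = np.empty((max_disp + 1,) + left.shape[:2])
    for d in range(max_disp + 1):
        sad = sad_cost(left, right, d)
        ham = census_cost(ct_l, ct_r, d)
        # exp(-inf) = 0, so pixels without a valid counterpart get the maximum cost of 2.
        volume[d] = 2.0 - np.exp(-sad) - np.exp(-ham)
    return volume
```

Each term of (5) lies in [0, 1), so the combined cost stays in [0, 2] and neither SAD nor the Hamming distance can dominate the other by scale alone.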


In this work, the initial matching cost is aggregated using the guided filter (GF) so that noise is
removed from the cost while edges are preserved [13]. This filter uses a reference image as guidance
during filtering; in this work, the left grayscale image is used as the reference image. The filter kernel
of the guided filter [17] is defined as (6):


$$GF_{p,q}(I_{ref}) = \frac{1}{s^{2}} \sum_{k:(p,q) \in w_k} \left( 1 + \frac{(I_{ref}(p) - \mu_k)(I_{ref}(q) - \mu_k)}{\sigma_k^{2} + \varepsilon} \right) \qquad (6)$$

where $I_{ref}$ represents the reference grayscale image and $p$ is the coordinate $(x, y)$ of the target
pixel. $w_k$ is the window of size $r \times r$ pixels, $s$ is the number of pixels in $w_k$, $q$ is the
neighbouring pixel and $k$ is the centre pixel of the window. $\sigma_k^{2}$ and $\mu_k$ are the variance
and the average of the intensity values of the reference image in $w_k$. $\varepsilon$ is the smoothness
control term. At this step the matching cost is aggregated, and the total cost $CA(p,d)$ is defined as
(7):

$$CA(p,d) = M_c(p,d)\, GF_{p,q}(I_{ref}) \qquad (7)$$
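
The aggregation step can be sketched as below. The sketch uses the box-filter formulation of the guided filter (a local linear model per window), which is known to be equivalent to weighting with a kernel of the form (6); it is an illustration rather than the code used in this work. The names `box_mean`, `guided_filter` and `aggregate` are hypothetical, `r` is taken here as a window radius (window side 2r + 1), and the default values of `r` and `eps` are arbitrary assumptions.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, with the window clipped at the image border."""
    h, w = img.shape
    sat = np.zeros((h + 1, w + 1))  # summed-area table with a leading row/column of zeros
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    total = sat[y1][:, x1] - sat[y0][:, x1] - sat[y1][:, x0] + sat[y0][:, x0]
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return total / area


def guided_filter(guide, src, r, eps):
    """Filter src with guidance image guide; eps plays the role of the smoothness term."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i ** 2      # sigma_k^2
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)                            # linear coefficient per window
    b = mean_p - a * mean_i
    return box_mean(a, r) * guide + box_mean(b, r)


def aggregate(cost_volume, guide, r=4, eps=1e-4):
    """Filter every disparity slice of a finite (D, H, W) cost volume with the same guide."""
    return np.stack([guided_filter(guide, c, r, eps) for c in cost_volume])
```

Filtering each disparity slice separately keeps the cost comparable across disparities, while the guidance image prevents costs from being averaged across intensity edges.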

After the cost aggregation, the winner-takes-all (WTA) strategy is applied to select the disparity with
the minimum matching cost as the initial disparity value [3]. The initial disparity $d_i$ is defined as
(8):


$$d_i = \arg\min_{d \in R} CA(p,d) \qquad (8)$$


where $R$ is the set of all possible disparity values. Lastly, the weighted median filter is used to
refine the disparity map [22]. Let $D$ indicate the resulting disparity map after adjusting the normal
base plane. The final disparity map $D_F$ is refined as (9):


$$D_F(p) = \mathrm{med}\{d_i\} \qquad (9)$$
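
The optimization and refinement steps of (8) and (9) can be illustrated with the following sketch. The text does not specify the weights of the weighted median filter, so the Gaussian intensity-similarity weights used here are an assumption made purely for illustration, as are the names `winner_takes_all`, `weighted_median` and `refine` and the parameters `r` and `sigma`.

```python
import numpy as np


def winner_takes_all(aggregated):
    """Eq. (8): per pixel, keep the disparity index with the lowest aggregated cost."""
    return np.argmin(aggregated, axis=0)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]


def refine(disparity, guide, r=2, sigma=0.1):
    """Eq. (9): replace each disparity by a weighted median over its neighbourhood."""
    h, w = disparity.shape
    out = np.empty_like(disparity)
    for y in range(h):
        for x in range(w):
            ys = slice(max(y - r, 0), y + r + 1)
            xs = slice(max(x - r, 0), x + r + 1)
            vals = disparity[ys, xs].ravel()
            # Hypothetical weighting: neighbours that look like the centre pixel count more.
            diff = guide[ys, xs].ravel() - guide[y, x]
            wts = np.exp(-(diff ** 2) / (2 * sigma ** 2))
            out[y, x] = weighted_median(vals, wts)
    return out
```

With a constant guidance image all weights are equal and `refine` reduces to a plain median filter, which removes isolated WTA outliers.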


3. EXPERIMENTAL RESULTS AND DISCUSSION 

The experiment was conducted to assess the performance of the proposed algorithm from various
perspectives. The results were evaluated using the Middlebury 2014 dataset, which consists of 15 training
images [24]. The absolute disparity error in the non-occluded area (nonocc) and in the entire image area
(all) was measured using the online Middlebury evaluation system. The proposed matching cost was compared
with three other matching costs. Table 1 shows the average error obtained with the four matching cost
computation methods on the 15 Middlebury training images. Figure 1(a) shows the absolute average error
(nonocc) for all training images and Figure 1(b) shows the absolute average error for all pixels (all).
The combination of SAD and CT gives the lowest average error, 8.37% for nonocc and 17.8% for all pixels.
CT alone gives 9.45% for nonocc and 19% for all pixels. The combination of CT and gradient gives 9.16% and
18.6% for nonocc and all, and the gradient matching cost gives 9.97% and 19.4%. These data show that the
combination of SAD and CT gives better accuracy than the other three matching cost computation methods.
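
The two error figures reported for each image differ only in the set of pixels over which the error is averaged. A minimal sketch of that computation is given below, assuming that pixels without ground truth are stored as infinity and that a boolean non-occlusion mask is available; the function name `average_abs_error` and its arguments are hypothetical, and the online evaluation system may differ in detail.

```python
import numpy as np


def average_abs_error(estimated, ground_truth, mask=None):
    """Mean absolute disparity error over pixels with known ground truth.

    mask=None evaluates all such pixels ("all"); passing a boolean
    non-occlusion mask restricts the average to that region ("nonocc").
    """
    valid = np.isfinite(ground_truth)  # assumption: unknown ground truth is stored as inf
    if mask is not None:
        valid = valid & mask
    return float(np.abs(estimated[valid] - ground_truth[valid]).mean())
```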

In the cost aggregation step, the evaluation was done by comparing the performance of GF with BF and the
BOX filter on the Middlebury training dataset. Figure 1 shows the comparison between GF and the other two
cost aggregation (CA) methods based on the average absolute error and the percentage of bad pixels. The
results show that GF provides higher accuracy than BOX. The performance of GF is almost the same as that
of BF, but GF still gives better accuracy. Figure 2 shows examples of the disparity maps produced by the
evaluated algorithms. Smooth disparity maps are obtained when GF and BF are used as CA methods. The edges
are well preserved with GF, whereas the disparity map obtained with BOX has fattened edges. The results
show that the combination of SAD and CT as matching cost works better with GF than with the other
aggregation methods.
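
The percentage of bad pixels used in this comparison counts the pixels whose disparity error exceeds a threshold. The threshold is not stated in the text, so the default of 1.0 pixel in the sketch below is an assumption, as are the function name `bad_pixel_percentage` and the convention that unknown ground truth is stored as infinity.

```python
import numpy as np


def bad_pixel_percentage(estimated, ground_truth, threshold=1.0, mask=None):
    """Percentage of evaluated pixels whose absolute disparity error exceeds threshold."""
    valid = np.isfinite(ground_truth)  # assumption: unknown ground truth is stored as inf
    if mask is not None:
        valid = valid & mask           # e.g. a boolean non-occlusion mask
    errors = np.abs(estimated[valid] - ground_truth[valid])
    return 100.0 * float((errors > threshold).mean())
```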

Figure 3 shows the left and right input images of ArtL and PianoL taken from the Middlebury dataset. In
both image pairs the illumination differs between the left and right images, which makes them very
challenging to match since the pixel amplitudes at corresponding points are totally different. However,
the proposed algorithm manages to find the corresponding points. The overall results show that the
proposed algorithm can generate a disparity map that is better than or competitive with other established
frameworks. Figure 3 also compares the disparity maps produced by the proposed algorithm and another
framework. The PianoL image also contains a floor area, which is considered a textureless region, and
Figure 3 shows that the proposed algorithm produces a smoother disparity map in that area than the other
framework. The disparity map of PianoL shows that the proposed algorithm is also able to keep the error
low in less textured regions with different illumination.

Table 2 shows the quantitative evaluation of the absolute error on the Middlebury dataset for the proposed
algorithm compared with other frameworks. R-NCC, BSM and the proposed method use a traditional stereo
matching framework, while DSGCA, DoGGuided and DF use artificial intelligence frameworks. Overall, the
proposed algorithm has the lowest average disparity error for the non-occluded region and for all pixels.
The results also show that the absolute error of the proposed algorithm is the lowest among the compared
frameworks for images with less texture such as Adiron, Recycle and Playroom. The proposed algorithm also
gives the best accuracy on the image with different exposure, MotorcycleE. Overall, the proposed method
gives better accuracy than the other methods, including the artificial intelligence-based frameworks.

Table 1. Comparison of percentage of average error between four types of matching cost computation (MCC)
for non-occluded and all pixels region.

Type of MCC SAD+CT          CT              GRAD            CT+GRAD
Middlebury  nonocc  all     nonocc  all     nonocc  all     nonocc  all
Avg         8.37    17.8    9.45    19      9.97    19.4    9.16    18.6
Adiron      2.98    7.41    3.14    7.84    15      19.6    9.74    14.3
ArtL        5.46    22.2    6.19    23.4    7.82    24.7    8.57    25.3
Jadepl      16.9    43.5    17.5    44.1    18.7    44.9    16.7    41.7
Motor       3.47    11.1    3.78    11.5    3.8     11.3    3.85    11.6
MotorE      3.37    11      3.67    11.4    8.46    16.6    5.91    13.9
Piano       5.82    10.6    6.68    11.6    7.33    11.9    7.56    12.3
PianoL      20.3    24.1    20.8    24.9    34.1    38.1    30.4    34.4
Pipes       7.47    20.1    7.82    20.7    7.45    20.4    7.78    20.6
Playrm      6.23    23      6.83    23.9    11.7    27.2    9.01    24.1
Playt       31.7    36.3    35.8    40.2    18.2    24.2    23.3    29.3
PlaytP      7.44    14.2    12.4    19      3.8     10.9    5.61    13.2
Recyc       3.79    7.95    4.16    8.36    4.48    8.06    4.01    8.48
Shelvs      12.7    15.7    13.3    16.3    16.2    18.9    14.3    17.4
Teddy       2.72    12.4    3.07    13      3.02    11.5    2.82    11.8
Vintge      19.5    25.7    22.7    28.7    9.42    16.7    7       13.7

[Figure 1: chart titled "Comparison of Different CA Method"; legend: BOX, BF, GF; x-axis "Type of Error"
with groups average error (nonocc), average error (all), bad pixel (nonocc), bad pixel (all); y-axis
"Percentage of Error (%)"]

Figure 1. Percentage of average absolute error and average of bad pixels for non-occluded and all pixels by
using GF, BF and BOX as cost aggregation method

[Figure 2: disparity maps for the Adirondack and Teddy images; panel label: BOX Filter]

Figure 2. Example of disparity map using GF, and BOX Filter

[Figure 3: image grid with rows ArtL and PianoL and columns Left Image, Right Image, Disparity Map using
Proposed Framework, Disparity Map using Other Framework]

Figure 3. Sample of left and right input images from the Middlebury dataset and disparity maps obtained
using the proposed algorithm and another framework for input images with different illumination


Table 2. The results of the quantitative evaluation of absolute error for all pixels and non-occluded
region using Middlebury dataset (all values in %)

Method   R-NCC [25]      DSGCA [26]      BSM [27]        DoGGuided [28]  MANE [29]       DF [30]         Proposed
         nonocc  all     nonocc  all     nonocc  all     nonocc  all     nonocc  all     nonocc  all     nonocc  all
Adiron   20.5    21.2    3.25    7.68    7.27    12.7    15.2    20.1    6.58    11.6    13.2    14.1    2.98    7.41
ArtL     10      12.5    5.95    21.7    11.4    28.7    9.57    28      5.81    22.9    16.4    18.2    5.46    22.2
Jadepl   67.2    91      18.9    45      30.5    58.7    27.1    56.5    20.7    45.9    77.8    103     16.9    43.5
Motor    9.59    11.5    3.6     10.6    6.67    14.8    5.64    13.8    4.52    12.4    11.2    13.2    3.47    11.1
MotorE   10.6    12.7    3.41    10.4    6.52    14.7    8.31    16.8    4.31    12.3    10.7    12.7    3.37    11
Piano    9.12    9.59    7.17    11.5    10.8    16      8.09    13.4    10.6    15.1    10.5    11.1    5.82    10.6
PianoL   15.8    15.8    21.1    24.5    32.1    35.8    32.4    37.3    20.9    24.7    26.4    26.4    20.3    24.1
Pipes    21.8    27.9    7.23    19.9    10.5    24.5    9.67    23.8    8.62    22.3    16.1    22.5    7.47    20.1
Playrm   29      30      9.36    24.6    12.5    29.4    14      30.3    15      31.1    19.6    20.9    6.23    23
Playt    18      17.5    29.4    34.5    24.4    31      24.5    30.8    34.7    39.9    13.3    13.9    31.7    36.3
PlaytP   13.1    13      7.94    14.8    12.8    20.2    5.32    13      10.5    17.3    14.8    16.3    7.44    14.2
Recyc    22.3    22.2    3.8     7.56    7.42    12.1    5.56    9.13    5.5     9.67    16.2    16.8    3.79    7.95
Shelvs   11.5    11.7    14.7    17.3    16.4    19.2    16.2    19      20.2    22.5    11.1    11.5    12.7    15.7
Teddy    4.13    4.81    3.51    12.2    4.88    14.3    4.15    13.4    3.12    12.5    5.04    6.16    2.72    12.4
Vintage  44.3    45.1    39.7    43.8    32.8    39.3    15      23.6    46.5    51      24.9    26.8    19.5    25.7
Average  19.8    22.9    9.75    18.7    13.4    23.5    12      22.3    11.9    21.3    19.2    22.7    8.37    17.8


4. CONCLUSION 

In this work, we proposed a local stereo matching algorithm that combines SAD and CT for matching cost
computation, GF for cost aggregation, WTA for disparity optimization and a median filter (MF) for
post-processing. This combination of steps reduces the average absolute error by 10% to 19% compared with
other matching algorithms. In addition, the proposed matching cost gives the lowest absolute error among
the compared methods, including on image pairs with different illumination and exposure. Accuracy on
images with less texture is also increased by the proposed algorithm. Based on the results, the local
stereo matching algorithm using the proposed method is able to reduce the matching error in less textured
regions and in regions with different illumination, while preserving the edges of the image well.